1 Project Description

1.1 General Overview:

  • People express their identity in unique ways, and one of the most important is apparel. A person’s self-concept is reflected in their clothing and brand preferences, which signal how they would like to be perceived.

  • As a result, many national and international clothing brands have emerged in Pakistan over the last few decades.

  • Entrepreneurs use brands as the principal factor of differentiation to gain competitive advantage over rivals, and branding plays an imperative role in the success of companies [1].

  • Clothes today are made from a wide range of different materials. Traditional materials such as cotton, linen and leather are still sourced from plants and animals [2].

[1] Kamran, A., Dawood, M. U., Rafi, S. K., Butt, F. M., & Akhtar, K. (2020). Impact of Brand Name on Purchase Intention: A Study on Clothing in Karachi, Pakistan. International Journal of Innovation, Creativity and Change, 278-293.

[2] Objective, C. (2021, December 10). What Are Our Clothes Made From? Retrieved from https://www.commonobjective.co/article/what-are-our-clothes-made-from


1.2 General Problem Statement:



2 EDA of the project

2.1 Step 1: Import modules


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from sklearn.impute import SimpleImputer
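`SimpleImputer` is imported above for filling in missing values later in the analysis. A minimal sketch of how it works, on a toy column rather than the sales data:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame with one missing price (illustrative, not the sales data)
toy = pd.DataFrame({"Price": [100.0, np.nan, 300.0]})

# Replace NaNs with the column mean
imputer = SimpleImputer(strategy="mean")
toy["Price"] = imputer.fit_transform(toy[["Price"]]).ravel()
print(toy["Price"].tolist())  # [100.0, 200.0, 300.0]
```

Other strategies (`"median"`, `"most_frequent"`, `"constant"`) follow the same fit/transform pattern.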

2.2 Step 2: Import data

df1 = pd.read_excel('/Users/snawaz/Documents/pychilla2/teamproject_sep3/Deep_note_linked/Sales.xlsx')

2.3 Step 3: Make a copy of the data

df = df1.copy()
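Working on a copy keeps the raw import intact, so the original frame can always be recovered if a cleaning step goes wrong. A minimal sketch of this behaviour, using a tiny stand-in frame rather than the actual sales data:

```python
import pandas as pd

# Stand-in for the imported sales frame (illustrative only)
df1 = pd.DataFrame({"Qty": [1, 2, 3]})
df = df1.copy()

# Transformations applied to the copy leave the raw import untouched
df["Qty"] = df["Qty"] * 10
print(df1["Qty"].tolist())  # [1, 2, 3]
print(df["Qty"].tolist())   # [10, 20, 30]
```

Note that `df = df1` alone would not achieve this: both names would point at the same frame, and edits through `df` would show up in `df1` as well.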

2.4 Step 4: Check the shape of the data

rows, cols = df.shape
print("Number of rows in the dataset:", rows)
## Number of rows in the dataset: 670082
print("Number of columns in the dataset:", cols)
## Number of columns in the dataset: 45

3 EDA of the project

3.1 Step 5: Check the data types of the columns

df.info()
## <class 'pandas.core.frame.DataFrame'>
## RangeIndex: 670082 entries, 0 to 670081
## Data columns (total 45 columns):
##  #   Column             Non-Null Count   Dtype         
## ---  ------             --------------   -----         
##  0   BillNo             670082 non-null  object        
##  1   BillDate           670082 non-null  datetime64[ns]
##  2   LoyaltyCard        33720 non-null   object        
##  3   Customer           661558 non-null  object        
##  4   Description        66014 non-null   object        
##  5   BillMonth          670082 non-null  object        
##  6   Warehouse          670082 non-null  object        
##  7   RegionName         670082 non-null  object        
##  8   Location           670082 non-null  object        
##  9   Category           670082 non-null  object        
##  10  DepartmentName     670082 non-null  object        
##  11  BrandName          668530 non-null  object        
##  12  CoBrand            670082 non-null  object        
##  13  Barcode            670082 non-null  int64         
##  14  DesignNo           670082 non-null  object        
##  15  Rejection          670082 non-null  object        
##  16  SeasonName         670082 non-null  object        
##  17  Attribute1         670057 non-null  object        
##  18  Attribute2         143125 non-null  object        
##  19  Attribute3         309262 non-null  object        
##  20  Attribute4         670082 non-null  object        
##  21  Attribute5         337322 non-null  object        
##  22  Attribute6         0 non-null       float64       
##  23  Attribute7         0 non-null       float64       
##  24  Attribute8         490234 non-null  object        
##  25  LocalImport        670082 non-null  object        
##  26  Color              670082 non-null  object        
##  27  Sizes              670082 non-null  object        
##  28  DiscountType       502886 non-null  object        
##  29  SalesmanName       670082 non-null  object        
##  30  Qty                670082 non-null  int64         
##  31  SalesReturnReason  14786 non-null   object        
##  32  Price              670082 non-null  int64         
##  33  Amount             670082 non-null  int64         
##  34  SaleExclGST        670082 non-null  float64       
##  35  GSTP               670082 non-null  int64         
##  36  GST                670082 non-null  int64         
##  37  DiscPer            670082 non-null  float64       
##  38  DiscAmount         670082 non-null  float64       
##  39  BarcodeDiscPer     670082 non-null  int64         
##  40  BarcodeDiscount    670082 non-null  int64         
##  41  NetAmount          670082 non-null  float64       
##  42  PointsEarned       670082 non-null  int64         
##  43  TaxPer             670082 non-null  int64         
##  44  Cobrand Acc        670082 non-null  object        
## dtypes: datetime64[ns](1), float64(6), int64(10), object(28)
## memory usage: 230.1+ MB
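The listing above shows several sparsely populated columns (LoyaltyCard, Description, Attribute6, Attribute7). A quick way to rank columns by their fraction of missing values, sketched here on a toy frame that mimics the mix of complete and sparse columns:

```python
import numpy as np
import pandas as pd

# Toy frame mimicking complete and sparse columns (illustrative)
toy = pd.DataFrame({
    "BillNo": ["A", "B", "C", "D"],
    "LoyaltyCard": [np.nan, "12", np.nan, np.nan],
    "Attribute6": [np.nan] * 4,
})

# Fraction of missing values per column, worst first
missing = toy.isnull().mean().sort_values(ascending=False)
print(missing.to_dict())  # {'Attribute6': 1.0, 'LoyaltyCard': 0.75, 'BillNo': 0.0}
```

Applied to `df`, this highlights immediately which columns (like Attribute6 and Attribute7, 100% null) are candidates for dropping rather than imputing.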

3.2 Step 6: Check the first 5 rows of the data

DT::datatable(head(py$df, 20), options = list(pageLength = 5, scrollX = T))

4 EDA of the project

4.1 Step 7: Check the unique values of the data
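The loop below prints every distinct value per column. When only the counts are needed, `nunique()` gives them in one call, with one caveat: it excludes NaN by default, whereas `len(df[col].unique())` counts NaN as a value. A toy sketch of the difference:

```python
import numpy as np
import pandas as pd

# Toy frame with a NaN-bearing column (illustrative)
toy = pd.DataFrame({
    "Warehouse": ["No", "Yes", "No", "Yes"],
    "LoyaltyCard": [np.nan, 12, 26, np.nan],
})

# nunique() drops NaN; unique() keeps it in the returned array
print(toy.nunique().to_dict())           # {'Warehouse': 2, 'LoyaltyCard': 2}
print(len(toy["LoyaltyCard"].unique()))  # 3 (NaN counted)
```

This off-by-one matters for columns like LoyaltyCard below, where NaN appears among the listed unique values.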

for column in df.columns:
    print('Number of unique data for {0} is {1}'.format(column, len(df[column].unique())))
    print('unique data for {0} is {1}'.format(column, df[column].unique()))
    print('=====================================')
## Number of unique data for BillNo is 439665
## unique data for BillNo is ['SALM-010116-00003' 'SALM-010116-00009' 'SALM-010116-00011' ...
##  'SDMT-241218-00112' 'SDMT-271218-00017' 'SDMT-271218-00018']
## =====================================
## Number of unique data for BillDate is 1089
## unique data for BillDate is ['2016-01-01T00:00:00.000000000' '2016-02-01T00:00:00.000000000'
##  '2016-03-01T00:00:00.000000000' ... '2018-12-23T00:00:00.000000000'
##  '2018-12-25T00:00:00.000000000' '2018-12-30T00:00:00.000000000']
## =====================================
## Number of unique data for LoyaltyCard is 5344
## unique data for LoyaltyCard is [nan 12 26 ... 32356 32370 32433]
## =====================================
## Number of unique data for Customer is 73677
## unique data for Customer is ['.' 'Mr.Shoaib sadiique' 'Mr.Adil' ... ' M HAYAT KHAN ' ' MRS IMRAN '
##  ' AIMEN IBRAHIM ']
## =====================================
## Number of unique data for Description is 11788
## unique data for Description is [nan 'Exchnage' 'Exchange' ... 'EMP-2261' ' I.D 2261' 'Blazer']
## =====================================
## Number of unique data for BillMonth is 36
## unique data for BillMonth is ['2016-01' '2016-02' '2016-03' '2016-04' '2016-05' '2016-06' '2016-07'
##  '2016-08' '2016-09' '2016-10' '2016-11' '2016-12' '2017-03' '2017-01'
##  '2017-02' '2017-05' '2017-04' '2017-06' '2017-07' '2017-08' '2017-09'
##  '2017-11' '2017-12' '2017-10' '2018-02' '2018-01' '2018-08' '2018-06'
##  '2018-07' '2018-04' '2018-03' '2018-05' '2018-11' '2018-10' '2018-09'
##  '2018-12']
## =====================================
## Number of unique data for Warehouse is 2
## unique data for Warehouse is ['No' 'Yes']
## =====================================
## Number of unique data for RegionName is 7
## unique data for RegionName is ['3-NORTH ' '1-KARACHI' '5-WAREHOUSES' '2-LAHORE' '4-CENTRAL PUNJAB '
##  '7- EXCLUDE SUB STORES' '6-SUB STORE ']
## =====================================
## Number of unique data for Location is 67
## unique data for Location is ['ALAM STORE' 'AL SAEED SUPER STORE ' 'TCS Atrium Mall'
##  'SANA ENTERPRISES ' 'TCS BEVERLY CENTER'
##  'TCS CLASSIC DEPARTMENTAL STORE (CLOSED)'
##  'CG Main Store  / A-5 2nd floor (CLOSED)' 'TCS DOLMEN CITY '
##  'TCS Dolmen Mall -Hyderi' 'TCS Dolmen Mall - Tariq Road'
##  'ENEM ENTERPRISES (CLOSED)' 'EXHIBITION ' 'TCS Fountain-Avenue-Lhr'
##  'TCS The Forum ' 'TCS FORTRESS SQUARE ' 'GALAXY PLUS'
##  'TCS HYDERI BLOCK-H ' 'HKB DEFENCE (CLOSED)' 'HKB LIBERTY '
##  'GULGASHT TOWN - MULTAN' 'HASSAN ENTERPRISES (CLOSED)'
##  'TCS KINGS MALL GUJRANWALA' 'TCS Chen-One-Tower- (CLOSED)'
##  'CAMBRIDGE ONLINE STORE' 'TCS BAHADURABAD ' 'TCS GULSHAN'
##  'TCS Park Tower - 2' 'RAJA SAHIB LINK ROAD' 'RAJA SAHIB LIBERTY'
##  'TCS Sialkot Cantt' 'SANAULLA & CO (CLOSED)' 'SANA VENTURE (CLOSED)'
##  'SANA STYLE ' 'THE SHOPPE-2' 'TCS MODERN SADDAR HYD' 'TCS WADUD SONS'
##  'TCS Z BLOCK DHA' 'TCS RCG MALL - FAISALABAD'
##  'RAJA SAHIB - WAPDA TOWN (CLOSED)' 'TCS SAFA GOLD MALL - G.FLR'
##  'ZEEN ONLINE STORE' 'TCS IDREES BOOK ZEEN- RWP' 'A-5 LOOSE WAREHOUSE '
##  'TCS AMANAH MALL - LAHORE' 'TCS LUCKY ONE - CAMBRIDGE'
##  'TCS PACKAGES MALL - LAHORE' 'TCS GIGA MALL-WTC-Cambridge'
##  'TCS LUCKY ONE - ZEEN' 'GALAXY PLUS 1 (M.IRFAN)'
##  'KORANGI EDHI STORE (PRESS DEPARTMENT)'
##  'TCS NISHAT EMPORIUM - 2 (Cambridge)' 'TCS GIGA MALL-WTC-Zeen'
##  'A-5 LOOSE WAREHOUSE (GROUND FLOOR)' 'B-53 LOOSE WAREHOUSE '
##  'DO BURJ - FSD - ZEEN' 'TCS Zeen Dolmen Mall Hyderi'
##  'TCS ZEEN DOLMEN CITY' 'GOJRA - ZEEN  (RAFIQ CENTRE)'
##  'HKB DEFENCE - Y-BLOCK' 'TCS NISHAT EMPORIUM - ZEEN' 'RCG MALL - ZEEN'
##  'SIALKOT (2) - ZEEN' 'TCS SAFA MALL - ZEEN' 'TCS ZEEN ATRIUM MALL '
##  'TCS Y - BLOCK  - LAHORE' 'TCS ZEEN DOLMEN TARIQ ROAD'
##  'TCS GUJRANWALA SATELLITE TOWN']
## =====================================
## Number of unique data for Category is 1
## unique data for Category is ['MENS SHIRT              ']
## =====================================
## Number of unique data for DepartmentName is 2
## unique data for DepartmentName is ['LICENSE ' 'CAMBRIDGE ']
## =====================================
## Number of unique data for BrandName is 19
## unique data for BrandName is ['LICENSE FULL SLEEVE ' 'LUXER' 'EXECUTIVE ' 'PRINCIPLE SHIRT '
##  'CAMBRIDGE CASUAL ' 'PORT FOLIO SHIRT ' 'CAMBRIDGE Since 1958'
##  'ARISTO SHIRT' 'TOMORROW ' 'LICENSE HALF SLEEVE ' 'DESIGN STUDIO '
##  'CAMBRIDGE HALF SLEEVE ' 'AFTER HOURS ' 'PRIVILEGE SHIRT ' nan
##  'CAMBRIDGE FULL SLEEVE ' 'ACTIVE SHIRT ' 'PERSONALLY CAMBRIDGE'
##  'ZERO TOLERANCE ']
## =====================================
## Number of unique data for CoBrand is 107
## unique data for CoBrand is ['LICENSE F/S              ' 'LUXER PLAIN F/S' 'Cambridge Executive F/S'
##  'PRINCIPLE PLAIN F/S      ' 'PRINCIPLE CLASSIC F/S    ' 'LUXER F/S'
##  'PRINCIPLE SWAN F/S' 'CAMBRIDGE CASUAL F/S' 'PORT FOLIO (SHADES) F/S '
##  'PORT FOLIO F/S  WCB      ' 'PORT FOLIO SHIRT YARN DYED' 'NO IRON EVER'
##  'DENIM F/S ' 'LUXER PLAIN WCB F/S' 'CAMBRIDGE SINCE 1958' 'ARISTO '
##  'EXECUTIVE H/S ' 'PRINCIPLE CLASSIC H/S' 'ARISTO MASON' 'OVERDYE F/S '
##  'LUXER H/S' 'TOMORROW F/S             ' 'LICENSE H/S              '
##  'PRINCIPLE POPLIN LUXE MILANO F/S ' 'PRINTED F/S SHIRTS'
##  'PRINCIPLE Y/DYED LUXE MILANO F/S ' 'PORT FOLIO Y/D H/S'
##  'D STUDIO  DESIGNERS SHIRT' 'CAMBRIDGE CASUAL H/S ' 'OXFORD H/S'
##  'TOMORROW H/S             ' 'AFTER HOURS F/S          ' 'CHEMBREY F/S '
##  'PRINCIPLE OXFORD F/S ' 'CHEMBREY H/S' 'AGE OF WISDOM F/S '
##  'PORT FOLIO H/S WCB       ' 'PRIVILEGE F/S ' 'SEERSUCKER F/S SHIRT'
##  'LIGHT WEIGHT H/S ' 'LIGHT WEIGHT F/S ' 'PORTFOLIO SATEEN F/S '
##  'DEAD SHIRT F/S ' 'ITALIAN' 'AFTER HOURS H/S          '
##  'SEERSUCKER H/S SHIRT' 'LICENSE                  ' 'OXFORD LICENSE F/S'
##  'HERRING BONE F/S' 'MELANGE YARN DYED F/S ' 'FLANNEL F/S '
##  'SHARP CAMBRDIGE ' 'CAMBRIDGE SHIRT' 'DOBBY F/S' 'Cotton Linen F/S '
##  'ESSENTIALS MENS FORMAL SHIRTS' 'Cotton Slub F/S' 'OXFORD LICENSE H/S'
##  'ACTIVE F/S' 'PERSONALLY CAMBRIDGE' 'D STUDIO F/S             '
##  'ZERO TOLERANCE F/S       ' 'F/S DOBBY' 'F/S HERRING BONE '
##  'F/S MELANGE YARN DYED ' 'F/S LICENSE           '
##  'F/S CAMBRIDGE EXECUTIVE ' 'F/S LUXER PLAIN WCB ' 'F/S LUXER YARN DYED '
##  'F/S ESSENTIALS MENS FORMAL SHIRTS' 'F/S ARISTO ' 'F/S ARISTO MASON'
##  'F/S NO IRON EVER' 'F/S LUXER PLAIN ' 'F/S PRINCIPLE CLASSIC YARN DYED'
##  'F/S DENIM ' 'H/S LICENSE         ' 'H/S OXFORD LICENSE ' 'H/S LUXER'
##  'F/S SHARP CAMBRDIGE ' 'H/S EXECUTIVE' 'F/S OVERDYE '
##  'F/S OXFORD LICENSE ' 'F/S PRINCIPLE PLAIN    ' 'F/S COTTON LINEN '
##  'H/S PRINCIPLE CLASSIC ' 'F/S PRINCIPLE POPLIN LUXE MILANO'
##  'F/S DEAD SHIRT ' 'F/S PORT FOLIO SHIRT YARN DYED' 'F/S PRINTED SHIRTS'
##  'H/S SEERSUCKER SHIRT' 'F/S PRINCIPLE SWAN ' 'F/S CHEMBREY'
##  'H/S CHEMBREY ' 'F/S CAMBRIDGE SINCE 1958' 'F/S COTTON SLUB '
##  'F/S AFTER HOURS         ' 'F/S SEERSUCKER SHIRT' 'F/S FLANNEL'
##  'F/S AGE OF WISDOM ' 'CAMBRIDGE UNIFORM ' 'F/S LICENSE                  '
##  'F/S LIGHT WEIGHT ' 'F/S TOMORROW          ' 'F/S CAMBRIDGE CASUAL'
##  'F/S PRINCIPLE Y/DYED LUXE MILANO ' 'H/S PORT FOLIO Y/D ']
## =====================================
## Number of unique data for Barcode is 25740
## unique data for Barcode is [492146 464028 464945 ... 591320 597473 577000]
## =====================================
## Number of unique data for DesignNo is 5881
## unique data for DesignNo is ['B5393' 'B6335' 'B6336' ... 'B10767' 'B10728' 'BU321']
## =====================================
## Number of unique data for Rejection is 2
## unique data for Rejection is ['No' 'Yes']
## =====================================
## Number of unique data for SeasonName is 31
## unique data for SeasonName is ['WINTER 2015 - 2016' 'SUMMER 2015' 'SUMMER 2016' 'SUMMER 2014'
##  'SUMMER 2011' 'WINTER 2014 - 2015' 'EID FESTIVAL 2015'
##  'EID FESTIVAL 2013' 'WINTER 2013 - 2014' 'WINTER 2010 - 2011'
##  'WINTER 2012 - 2013' 'SUMMER 2013' 'EID FESTIVAL 2014'
##  'WINTER 2011 - 2012 ' 'Opening' 'SUMMER 2012' 'EID FESTIVAL 2012'
##  'EID FESTIVAL 2011' 'SUMMER 2010' 'EID FESTIVAL 2016' 'WINTER 2016-2017'
##  'EID AL ADHA 2016' 'SUMMER 2017' 'EID FESTIVAL 2010' 'WINTER 2017 - 2018'
##  'EID FESTIVAL 2017' 'EID AL ADHA 2017' 'SUMMER 2018' 'EID FESTIVAL 2018'
##  'WINTER 2018 - 2019' 'EID AL ADHA 2018']
## =====================================
## Number of unique data for Attribute1 is 13
## unique data for Attribute1 is ['1 Year & Above (Discounted)' 'No Stock' 'Winter Stock' 'Obsolete'
##  'Rejection' 'B Category' 'Active (Fresh)' 'Cut Range Items '
##  'Summer Sale ' 'Summer Hold Stock ' 'WINTER ACTIVE ' 'WINTER DISCOUNTED '
##  nan]
## =====================================
## Number of unique data for Attribute2 is 6
## unique data for Attribute2 is ['Buy 1 Get 1 free' nan 'Regular Fit' 'Slim Fit' 'Comfort fit'
##  'Modern fit']
## =====================================
## Number of unique data for Attribute3 is 4
## unique data for Attribute3 is [nan 'PAKISTAN ' 'IMPORT-S' 'CHINA ']
## =====================================
## Number of unique data for Attribute4 is 1
## unique data for Attribute4 is ['CAMBRIDGE']
## =====================================
## Number of unique data for Attribute5 is 3
## unique data for Attribute5 is [nan 'A' 'B']
## =====================================
## Number of unique data for Attribute6 is 1
## unique data for Attribute6 is [nan]
## =====================================
## Number of unique data for Attribute7 is 1
## unique data for Attribute7 is [nan]
## =====================================
## Number of unique data for Attribute8 is 4
## unique data for Attribute8 is [nan 'Regular' 'Premium' 0]
## =====================================
## Number of unique data for LocalImport is 2
## unique data for LocalImport is ['Local' 'Import']
## =====================================
## Number of unique data for Color is 388
## unique data for Color is ['Forest Teal' 'Pool Blue' 'L/GREEN        ' 'BLUE           '
##  'MEHROON        ' 'MIX            ' 'L/BLUE         ' 'Ultra Voilet'
##  'PINK/WHITE     ' 'DEEP MELON' 'NUGGET' 'YELLOW         ' 'BLACK Plaid'
##  'WHITE          ' 'SKY BLUE       ' 'L/GREY         ' 'TURQ '
##  'GREY           ' 'BROWN/BLUE     ' 'RED            ' 'Red/white'
##  'Crystal Blue' 'CREAM          ' 'M/BLUE         ' 'STONE          '
##  'MAROON         ' 'PURPLE         ' 'D/GREY         ' 'ROSE DAWN'
##  'NAVY           ' 'L/PINK         ' 'ROYAL BLUE     ' 'WHITE/BLUE     '
##  'BLACK          ' 'PURPLE/BLACK   ' 'WHITE/RED      ' 'BLUE/WHITE     '
##  'NAVY STRIP     ' 'OFF WHITE      ' 'Mushroom       ' 'BLUE/NAVY'
##  'PINK           ' 'PURPLE STRIPE  ' 'GREY/WHITE     ' 'BLUE/GREY      '
##  'VIOLET         ' 'LILAC          ' 'Sea Green      ' 'WHITE/BROWN    '
##  'RED/YELLOW' 'J/BLACK        ' 'GREY/PINK      ' 'BLUE/BROWN     '
##  'FRENCH BLUE' 'BLUE/RED       ' 'BLACK/WHITE    ' 'INK BLUE       '
##  'GOLDEN         ' 'GREY2' 'BROWN          ' 'BLUE/GREEN     '
##  'White/Black    ' 'D/BLUE         ' 'OCEAN ' 'OLIVE          '
##  'NAVY BLUE      ' 'Blue/LILAC' 'VAPOR GREY ' 'RED/BLUE            '
##  'WHITE/GREY     ' 'ORANGE         ' 'Multi ' 'African Violet'
##  'WHITE/PINK' 'WHITE/PURPLE          ' 'PINK/BLUE' 'Grey/Blue      '
##  'BLUE/PURPLE' 'GREEN          ' 'RED PLAID' 'Night Shade Blue'
##  'NAVY/RED       ' 'KHAKI          ' 'Burgundy       ' 'BLACK/BLUE     '
##  'BLUE/YELLOW    ' 'CHARCOAL       ' 'RED STRIPE     ' 'GREY/GREEN     '
##  'BLUE/BLACK     ' 'RED WOOD           ' 'PEARL PINK ' 'Military Green'
##  'GREEN/BLUE     ' 'BANANA GREEN' 'WHITE/NAVY' 'PLAE RED' 'FALL LEAF'
##  'BEIGE          ' 'L/PURPLE       ' 'WHITE/MEHROON' 'WHITE1         '
##  'PURPLE/WHITE   ' 'OPTICAL/WHITE  ' 'BLACK/PURPLE   ' 'YELLOW/PURPLE'
##  'FAWN           ' 'BIKING RED' 'NAVY/WHITE     ' 'L/YELLOW '
##  'Grey/Black     ' 'BROWN/WHITE    ' 'Tea Pink       ' 'AQUA           '
##  'RIVER BLUE     ' 'Wild Ginger' 'RED BUD' 'BLUE/PINK      '
##  'FRESH SALMON' 'CREAM1' 'Ultramarine' 'TURKISH TILE' 'D/PURPLE       '
##  'BLACK/RED      ' 'BEIGE/BLUE     ' 'GREY/RED' 'ORANGE/GREY    '
##  'PEACH          ' 'WHITE/GOLDEN   ' 'Carrel' 'ORANGE STRIPE '
##  'Yellow/White' 'WHITE STRIPE   ' 'Blue check     ' 'Turquoise' 'Red/Navy'
##  'Sky blue/Black' 'Blue Strip     ' 'D/GREEN        ' 'L/BROWN        '
##  'INDIGO         ' 'ROSE CLOUD     ' 'RUST           ' 'D/Mehroon      '
##  'Green Stripes  ' 'AQUA/WHITE' 'PURPLE/GREY    ' 'NAVY/GREEN     '
##  'ROYAL          ' 'Orange Com' 'LEMON          ' 'BLACK/NAVY'
##  'C/BLUE         ' 'Oxford tan' 'AQUA BLUE' 'Maroon/Navy' 'Aqua Gray'
##  'RED/BLACK      ' 'WHITE/ORANGE   ' 'SEA BLUE       ' 'R/BLUE         '
##  'BLUE/RUST      ' 'Black/Grey     ' 'WINE' 'NAVY/GREY' 'Palace Blue'
##  'MEHNDI         ' 'HIGH RISE' 'SHADOW PURPLE ' 'ECRU OLIVE     '
##  'PINK/GREY      ' 'BROWN/GREY     ' 'NAVY/BEIGE' 'PASTEL BLUE'
##  'BEIGE/WHITE    ' 'WHITE/GREEN    ' 'Brown/Black    ' 'OLIVE/GREY     '
##  'SHADOW GREY' 'BLUE/ORANGE    ' 'MONUMENT' 'FAWN/GREY      '
##  'YELLOW/BLUE    ' 'QUICK SILVER ' 'Black/Green' 'C/SEA' 'BLACK/SILVER   '
##  'Sky blue/Navy' 'IRON' 'Ballad Blue ' 'N/BLUE         ' 'BLACK1         '
##  'WHITE CHECK' 'STAR GAZER' 'SAND           ' 'NAVY/INDIGO    '
##  'Red Check      ' 'RED/GREY' 'OLD NAVY       ' 'NAVY/YELLOW'
##  'MISTGREY/BLACK ' 'DEEP WATER BLUE' 'L/BLUE STRIPE' 'LIME GREEN     '
##  'GREY/MAROON' 'Grey Check     ' 'DUST BLUE' 'NAVY CHECK     '
##  'LILAC STRIPE ' 'GREY/NAVY' 'INDIGO RED' 'INDIGO BLUE' 'Dot Navy'
##  'S/BLUE         ' 'PLACID BLUE' 'LILAC CHECK ' 'Dot Blue'
##  'D/NAVY         ' 'INDIGO YELLOW' 'SNOW WHITE     ' 'Dot Red' 'Geo Blue'
##  'Dot Green' 'T/Yellow' 'M/GREY         ' 'WALNUT         '
##  'STRAIGHT BLUE' 'SILVER PINK' 'GREEN/WHITE    ' 'AQUAMARIN      '
##  'ALASKAN BLUE   ' 'Evening Haze' 'B/RED          ' 'NAVY/BROWN     '
##  'Vintage Violet' 'Ice Blue' 'SMOKE GREEN' 'Blue Bonnet' 'C/RED          '
##  'N-CHOCLATE       ' 'NAVY/L.BLUE    ' 'Dot Grey' 'Self Blue' 'V-YELLOW'
##  'WHITE/LILAC' 'SKY WHITE      ' 'CHATEAU GREY' 'TEAL           '
##  'PEARL WHITE' 'CORAL          ' 'Black/GOLDEN' 'BRIDAL ROSE'
##  'HARBOUR BLUE' 'METAL          ' 'PEACH BLUSH    ' 'VANILLA ICE    '
##  'Cerulean White' 'DARK BLUE      ' 'PURPLE/LILAC' 'RED OXIDE'
##  'Mid Night Blue ' 'FOAM GREEN' 'DRESS BLUE     ' 'BLUE/WHITE2'
##  'BLUE/PAISLEY' 'Orchid' 'BLACK/PAISLEY' 'BLACK/FLORAL' 'Lavender'
##  'WHITE/FLORAL' 'NIRVANA' 'D/BROWN        ' 'Black Strip    '
##  'BROWN/PURPLE          ' 'TURKISH COFFEE ' 'Cherry' 'WHITE9'
##  'WHITE3         ' 'WHITE6' 'WHITE7' 'BLUE/KHAKI' 'WHITE8' 'WHITE5'
##  'WHITE2         ' 'WHITE4         ' 'BLUE/WHITE 1   ' 'NAVY/BLUE'
##  'DENIM BLUE' 'MINERAL BLUE' 'FUCHSIA ' 'EGYPHAN BLUE' 'Electric Blue'
##  'CAROLINA BLUE' 'Sky blue/White' 'CORN FLOWER BLUE' 'BLUE/WHITE3' 'BLUE1'
##  'WHITE/L.BLUE' 'BLACK/WHITE DOBBY' 'Aqua sky' 'NAVY/PURPLE    '
##  'Rust/White' 'Ocher/White' 'ORANGE/WHITE   ' 'RED DOBBY' 'GREY/ROYAL'
##  'Ocher/Grey' 'SKY/RED' 'BLACK DOBBY    ' 'ROYAL/SKY' 'PLUM' 'Grey/Brown'
##  'FROST          ' 'GLACIER GREY   ' 'WHITE12' 'NAVY/LILAC' 'WHITE11'
##  'L/BLUE DOBBY' 'Grey/Purple' 'Sky Dobby' 'DUSTY PINK' 'Navy/Sky Blue'
##  'Blue Berry' 'D/GREY STRIPE' 'BLACK/FEROZI' 'BROWN STRIP    '
##  'PURPLE CHECK' 'Pink check' 'ORANGE CHECK' 'WHITE10' 'D.BLUE/WHITE'
##  'GREEN/NAVY     ' 'ORANGE/BLUE' 'MINT           ' 'O.WHITE/NAVY   '
##  'M/DARK GREY    ' 'WHITE/SKY' 'NAVY/SILVER    ' 'GREEN/LILAC'
##  'WHITE DOTTED   ' 'Black dot' 'Red CIRCLE' 'Blue Turqoise' 'ROYAL WHITE'
##  'WHITE/L.BROWN' 'BLUE/OFFWHITE' 'BLACK FLORAL' 'ORANGE/NAVY' 'D/CYAN'
##  'ROYAL BLUE/GREEN' 'GREEN/BLACK    ' 'D/PINK' 'D/FAWN         ' 'L/NAVY'
##  'C/GREY         ' 'PINION' 'Blue Dobby' 'NAVY/OCHR' 'SWIASS DOT'
##  'White/Turquoise' 'WHITE/PEACH' 'NAVY/OFFWHITE           ' 'SKY/BROWN'
##  'WHITE/L.PURPLE          ' 'D.BROWN/WHITE' 'Grey Strip     '
##  'Black/Maroon' 'Blue Stripes   ' 'NAVY/PINK' 'Blue dot' 'GREY DOBBY'
##  'PEACH STRIPE' 'GREEN/PURPLE' 'Maroon/White   ' 'WHITE/FEROZI   '
##  'BEIGE/BROWN    ' 'AQUA/YELLOW' 'WHITE PINK     ' 'Crystal Pink   '
##  'RED/BROWN      ' 'S/L BLUE       ' 'Cayenne' 'Strom Grey     '
##  'Maroon/Grey' 'BROWN/NAVY     ' 'SAGE           ' 'FAWN/WHITE'
##  'WHITE DOBBY']
## =====================================
## Number of unique data for Sizes is 26
## unique data for Sizes is ['LAR  ' 16 '14½  ' '15½  ' '16½  ' 17 'MED  ' 'X-LAR' '17½  ' 15 'MEDIM'
##  'SML  ' 'LARGE' 'SMALL' 'XLARG' 'XX-LR' 'X-SML' 'XXLRG' '16½' 'MIX' '15¾'
##  '15½' '17½' '14½' 'MIX  ' '18½  ']
## =====================================
## Number of unique data for DiscountType is 10
## unique data for DiscountType is [nan 'No Discount' 'Director Relatives' 'Group Discount'
##  'Employee Discount' 'Special Promotions' 'Bundle' 'Promotion' 'EPP'
##  'A Suit for Every Occasion']
## =====================================
## Number of unique data for SalesmanName is 796
## unique data for SalesmanName is ['840 WASEER' '836 AMEER ZAIB' '2582 AWAIS' '2558 BABAR' '2581 WAQAR '
##  '837 M. HASNAIN ALI' '839 AMJAD ALI' '0086 M HABIB' '2187 SHAFIQ SARWAR'
##  '0602 IRFAN AZIZ' '831 M SHOAIB' '2460 BILAL' '751 SAAD ARIF'
##  '2339 ARSHAN' '1106 FAIZ AHMED KHAN' '1050 ZUBAIR HUSSAIN' '2677 UZAIR'
##  '2625 ALI ' '2604 AMJAD KHAN' '1214 ZEESHAN AHMED' '1945 M HASEEB'
##  '826 ZEESHAN' '2540 MALIK ' '824 IMTIAZ AHMED' '822 SHEZAD KHAN'
##  '825 RAEES AKBER' '0017 M MAQSOD' '2662 MUSAWIR SHARIF'
##  '852 SYED WAQAR HUSSAIN SHAH' '1170 SADDAM HUSSAIN' '851 KAMRAN HUSSAIN'
##  '859 ADNAN MUKHTAR' '1949 AZHAR MEHMOOD' '861 M USMAN' '827 SAMIR'
##  '1929 RASHID KHAN' '1606 Rauf Khan' '1944 KASHIF QURESHI' '866 KASHIF'
##  'NAVEED JUMMA' 'NASIR SOOMRO' 'FAROOQ YAQOOB' '709 Ali' '745 M UMAIR'
##  '749 SADIQ ' '1939 Irfan Khan' '2341 LUBNA' '1593 SYED AHSAN ALI'
##  '2395 SAQIB' '743 M AMIR S' '744 WAQAR KHAN' '752 IRFAN' '750 UMAID '
##  '2619 Sohail' '748 CYNTHIA WILSON' '784 SARAH' '714 JAMAL'
##  '697 SHOAIB KHAN' '700 KHALID' '2002 M FARHAN ALI' '2523 FAHAD '
##  '2285 M ILYAS ' '2693 KHALIL ' '696 ABDUL RAZZAQ' '2340 HARIS '
##  '2168 DANISH' '2763 ADNAN' '2104 NABEEL AHEMD' '711 FAIZAN'
##  '1686 AHMED MIRZA' '1689 DANIAL AKHTAR' '1037 NAVEED KAUSAR'
##  '792 WASIF KARIM ' '2347 KHUSHBOO' '2113 M WAQAS' '708 S.M.AKBAR SHAFI'
##  '2337 Saniya' '1086 AHSAN ALI' '1085 DANIYAL ' '2206 M SAQIB'
##  '712 QURYAT' '1057 WAQAS AHMED' '798 SEHRISH YAQOOB'
##  '2702 JANNAT IMTIAZ ' '1036 ZUBAIR SHAHID' '1081 ASAD' '965 QAISER '
##  '900 NAVEED ASGHAR' '2350 AMIR' '0068 NOMAN ZAHOOR' '1245 ABDUL SAMAD'
##  '0862 M SALMAN' '2534 DANISH' '2647 M SALMAN' '923 SALMAN YOUNAS'
##  '738 SHEERAZ AHMED' '1868 ELVIS GEORGE' '740 SHAHID BAIG'
##  '732 AURANGZAIB' '814 SHAN ALI' '0067 RANA RAHEEL HAQ' '966 M ANSAR SHAH'
##  '905 SHEHROZ ' '968 MAFFIA' '1162 M.ATEEB GHANI' '950 KANWAL '
##  '2055 VICKY MAHSI' '927 MAQSOOD AHMED' '0083 M FAIZAN'
##  '1000 M.SALAHUDDIN' '999 SHEBAZ RIAZ' '926 M USMAN TANVEER'
##  '1678 Abdul Rehman   ' '929 SALEEM' '702 SHAHZAIB ' '2401 AREEB'
##  '2483 UFAQ' '2409 SALMAN' '2240 MAQSOOD ' '765 M.NAVEED SIDDIQ'
##  '791 M HARIS' '2241 BASIT' '2621 AMMAR' '805 RABIA KHAN' '2768 TAHIR'
##  '0659 AZYAAT NOOR' '951 ASIF' '2461 NOMAN ' '2633 A BASIT' '2589 MUZAMIL'
##  '952 SAJID RAZA' '935 M ADNAN' '2272 SUMAIR' '993 FAISAL BHATTI'
##  '908 JUNAID JAMEEL' '2446 ZOHAIB' '2391 ANIQ ' '2660 SHEHZAD AYUB'
##  '0220 M SHEHBAZ BARKA' '975 TABISH SALEEM R' '2747 SABIR ' '984 IMRAN'
##  '2598 SHAFIQ' '2602 AHSAN' '985 OMAN' '2601 TAHIRA' '2731 ROA SAJAWAL'
##  '2599 SHAMEEM' '986 NIDA JAMEEL' '846 WAQAS HUSSAIN' '847 NOMAN GILLANI'
##  '848 ZEESHAN' '2058 S AQEEL ' '0062 JAVED IQBAL' '2634 USMAN '
##  '0628 MUZAFFAR HUSSAI' '2508 S.RIAZ' '2535 RAZA' '2506 RASHID '
##  '976 M IRFAN' '2511 BAKHTAWAR' '962 M IRFAN' '961 FARAN NAZAR '
##  '964 SOHAIL ZAFAR' '2398 SHAHRUKH' '1881 ABDUL RAHEEM HANIF'
##  '2397 BENISH' '730 JAFFAR AKBER' '2144 KHURRAM ' '1588 AMBREEN'
##  '2592 RIDA' '775 MANZOOR AHMED' '753 ARSALAN' '1502 M.TAIMOOR'
##  '1993 M RIZWAN ' '1103 HASSAN' '1123 HASAN' '2630 IMRAN' '2402 SHAHYER'
##  '2280 FAHAD RAFEEQ' '1556 AMMAR SALEEM' '2610 ZOYA' '2768 ANAS '
##  '2754 WAQAS' '1585 MASOOR UL HAQ' '1148 BABER IRFAN' '2433 AQEEL'
##  '914 SHARJEEL SAEED' '994 JAMSHAID  AHMED' '918 ASAD' '1012 BILAL'
##  '1011 UMER BUTT' '0010 GHULAM ALI AKBA' '2273 SHAMIM' '0026 M NADEE'
##  '2388 Nayla ' '2425 RABIA' '936 Muzahir ' '1664 ASAD YOUNUS' '2435 ROZI'
##  '2322 Adnan Maqbool' '0944 IMRAN' '1103 KASHIF UR REHM' '2520 Shahrukh'
##  '2759 QAVI' '1764 M NASIM BAIG' '875 ASHFAQ HUSSAIN' '2549 M ABDUL'
##  '2640 NOMAN ' '2654 WASEEM BOOTA' '860 M WASEEM' '2348 MAZHEAR'
##  '2670 BABAR SHEHZAD' '2668 SAEED' '2648 ARSALAN FIDA'
##  '865 SYED AHMED SHAH' '2671 AMMAR' '864 SADDIQUE AKBAR'
##  '863 SHAHZADA A R' '862 HASAN RAZA' '2667 WASEEM ' '1946 FARIS JAN'
##  '902 WASIM FAROOQI' '2448 BILAL KHALID ' '1015 AZAM PERWAIZ'
##  '2590 FAISAL' '0266 JUNAID AHMED' '1596 FAHAD MUGHAL' '880 QASIM'
##  '882 SAJID' '2656 SYED IRFAN ALI' '164 IMRAN ALI' '2386 M Shehzad'
##  '0723 SYED ALI HASSAN' '2536 BILAL ZIA' '2642 AWAIS ABBAS' '2389 RASHIDA'
##  '1018 MARIA ' '2744 KHALID ' '2773 WAQAS' '2829 MOIZ' '2835 GHAZANFER '
##  '1090 MUDASSIR ' '1099 FAIZAN' '2890 TALHA' '2271 SADDAM RASHEED'
##  '854 MARIA' 'JAVED QAMAR' '1091 SUMBAL' '2846 MUZAFFAR' '2810 HAMZA'
##  '1096 ABID' '754 MEHREEN' '1097 AHSAN' '698 FAIZAN' '1092 ASAD'
##  '2816 S.WAHAB ' '2833 SHAFI UDDIN' '724 ERUM' '2830 FURQAN '
##  '2847 OWAISE' '2719 AMIR RAIZ' '953 USMAN' '734 SALMAN' '1053 WAQAS'
##  '733 WAQAS SALAMAT' '969 USMAN' '945 KANWAL ' '928 M. Laique'
##  '2709 Jwahir' '2826 KAZIM' '2767 FARHEEN' '2771 KINZA' '2808 FARHAN'
##  '1112 ARIF ' '2827 AJMAL' '1082 ADEEL ' '2888 MOHSIN' '2886 SUFIYAN'
##  '1084 IMRAN' '2836 M.FAIZAN' '729 AWAIS' '741 MUDDASIR' '755 BILAL'
##  '1088 NOMAN' '2794 IMTIAZ' '939 A BASIT ' '940 WAQAS' '938 WAQAS'
##  '2737 SABIR ' '2742 RIZWAN' '0223 SYED AHMED ALI' '2812 ANIQA'
##  '2774 SONIA' '2733 HAMZA' '762 IMRAN' '761 SAQIB' '2815 ZOHAIB'
##  '0415 SELEEM AHMED' '0916 WAQAS' 1426 '2879 HARIS ZAREEN' '2932 IJAZ'
##  '2635 SHAFIQ' '2937 ASAD' '2915 NIGHET' '2943 ASIF' '2896 NOMAN '
##  '719 KAMRAN' '2852 HAMZA' '715 Ali Zia' '2854 ZEESHAN' '2428 HAMZA'
##  '954 SAEED' '2929 SHAH ZAIB' '2935 TAHIR' '1110  HASSAM'
##  '1043 SALMAN AFZAL KH' '930 AZEEM ' '2953 Faizan uddin' '2861 ALINA'
##  '2951 Ahtesham Khan' '2900 MUSARRAT' '2819 MEHWISH' '2916 ASGHER'
##  '977 ARSALAN' '978 AFIFA ' '963 MEHTAB GUL' '2899 ADEEL'
##  '2898 SADIA KHAN' '2979 Farhan' '2954 Anzar' 'hamza' '777 M.Akhtar'
##  '915 KAMRAN ' '2820 NABEEL YOUSAF' '2876 JUNAID BUTT' '1002 A QADEER'
##  '2960 ZIA' '1001 MOEED' '2962 FAIZAN' '1014 ALI' '3005 SHABIR ASLAM'
##  '2914 SHABAN' '876 MUSTANSAR AYYAZ' '2973 TOUQEER' '2976 ZAIN'
##  '3017 S TAIMOOR SHAH' '867 SANOBIR MASIH' '887 ASAD' '903 JALEEL'
##  '2931 SADIA NAZIR' '3008 M IMRAN' '3009 RANA M AZAZ'
##  '3016 TEHZEEM UL HAQ' '833 SAQIB ISHFAQ' '2999 AHSAN ABBASI'
##  '3057 Faizan Anwer' '716 M.MAIRAJ' '3071 MIRZA TAIMOOR'
##  '3024 ZESHAN JAVEED' '3030 NASIR ABBAS' '828 M.Waqar'
##  '3135 RAHEEL ISMAIL' 'Saqib Al' '3102 M. Shoaib' '3053 Hira Naz'
##  '3070 BABAR ' '713 MEHREEN' '3105 Ureedullah' '3054 Duke Alexander'
##  '1007 SALMAN AFZAL ' '931 M.Azeem' '3160 Mohsin Raza' '3049 MUSHFIQ'
##  '796 WAQAS' '795 RASHAD' '3058 Saima Malik' '3064 A GHANI' '797 NIDA'
##  'Faiz ul haq' '3148 Humza Asif' '3149 R.Zulfiqar ' '3122 Babar'
##  '3018 RAJA DANISH' '811 QAMAR' '3073 MOIN ' '767 ZAHEER' '769 SAAD IQBAL'
##  '768 M.Haroon ' '3045 RIZWAN' '3097 Umar Masood' '2877 MATEEN '
##  '1003 Noman Ashiq' '1004 Farhan Liaquat ' '942 Salman'
##  '877  Mubarak Shah' '878 S.Farhan' '2994 AMEER HAMZA'
##  '3023 MUTQEMUN.NISA' '3021 IZHAR SHARIF' '3028 SUMMON GULL'
##  '3027 FIRYAL JORGH' '3094 MUBASHAR ' '889 ZULFIQAR ALI' '1052 AMIR'
##  '2997 MALIHA KHAN' '3026 SAMIA SHAKEEL' '13239 Subhan' '13275 AA'
##  '13227 M.Rashid' '857 Naqash Masih' '856 M.Waqas' '855 Zulqarnain'
##  '788 Noman M.Ashraf' '1111 M.ASIM ' '758 M JAWED' '756 ASAD' '699 M MUAZ'
##  '13240 Usama Ahmed' '3204 A.Wahab' '813 M.Saqib'
##  '13293 Muhammad Rizwan Khan' '901 Rafaqat' '920 zaigham Altaf '
##  '921 USMAN BUTT' '919 M.Usman' '971 Bilal Ahmed' '970 Saira Mustaqeem'
##  '3216 UMER HAYAT' '1113 Hamza Khan' '815 HAMZA ' '799 SYEDA KANZA'
##  '907 Rustam Khan' '13220 S.Ahmed Hussain' '989 S.Asim Bukhari'
##  '987 Aroosa' '3141 ISMAIL KHAN' '3182 Arsalan Aslam' '3140 M.Rizwan'
##  '3206 Iqbal' '770 Huma Naz' '778 Zain ul Abdeen' '13238 M.Mohsin'
##  ... '2159 Shahbaz ' 'UreedUllah' 'Awais Raza']
## =====================================
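The salesperson strings above mix upper/lower case, trailing spaces, and employee codes embedded in the name (e.g. '808  IMRAN ', '1262AHMED RAZA' vs '1262 AHMED RAZA'), so the unique count overstates the true number of people. A minimal cleanup sketch, assuming illustrative column/variable names (not from the source notebook):

```python
import pandas as pd

# Sketch: normalise whitespace and case, then pull out the leading numeric
# code so near-identical entries collapse together. Sample values are copied
# from the dump above; names here are illustrative.
names = pd.Series(["808  IMRAN ", "1262AHMED RAZA", "1262 AHMED RAZA"])
clean = (names.str.strip()
              .str.replace(r"\s+", " ", regex=True)
              .str.upper())
codes = clean.str.extract(r"^(\d+)")[0]  # leading code, NaN if absent
print(clean.tolist())  # ['808 IMRAN', '1262AHMED RAZA', '1262 AHMED RAZA']
print(codes.tolist())  # ['808', '1262', '1262']
```

Deduplicating on the extracted code rather than the raw string catches pairs like '1262AHMED RAZA' / '1262 AHMED RAZA' that differ only in spacing.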
## Number of unique data for Qty is 60
## unique data for Qty is [  1  -1   2   3   4  -2   6   0   5  27   8  17  12  14  24   7  19  -3
##   -5  -4 -19 -14 -26 -12  -7  11  23  22  21 -21 -27 -28 -30 -15 -17  -9
##   -6  18   9  10  20  26  16  15  13  -8 -23 -24 -18 -10 -16 -13 -11 -20
##   29  28  45  44  43  32]
## =====================================
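The negative quantities in Qty above are consistent with return transactions (they are mirrored by the negative Amount values further down). Splitting on the sign gives separate sale/return views; a toy frame stands in for the real df here:

```python
import pandas as pd

# Sketch, assuming negative Qty marks a return row (not confirmed by the
# source data dictionary): split the frame on the sign of Qty.
toy = pd.DataFrame({"Qty": [1, -1, 2, 0, -2, 3]})
sales = toy[toy["Qty"] > 0]
returns = toy[toy["Qty"] < 0]
print(len(sales), len(returns))  # 3 2
```

Zero-quantity rows fall into neither bucket and may deserve their own look.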
## Number of unique data for SalesReturnReason is 7
## unique data for SalesReturnReason is [nan 'DISLIKE ' 'COLOR' 'SIZE' 'FIT' 'PATTERN ISSUE' 'DAMAGED / DEFECTED ']
## =====================================
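SalesReturnReason is NaN on rows that are not returns, and several labels carry trailing spaces ('DISLIKE ', 'DAMAGED / DEFECTED '). Filling and stripping yields a tidy categorical; the fill label below is an assumption, the values are copied from the dump above:

```python
import pandas as pd
import numpy as np

# Sketch: replace NaN with an explicit "no return" label (assumed naming)
# and strip stray whitespace from the recorded reasons.
reasons = pd.Series([np.nan, "DISLIKE ", "DAMAGED / DEFECTED "])
clean = reasons.fillna("NO RETURN").str.strip()
print(clean.tolist())  # ['NO RETURN', 'DISLIKE', 'DAMAGED / DEFECTED']
```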
## Number of unique data for Price is 287
## unique data for Price is [1710 1614 1805 ... 3483 3296 5278]
## =====================================
## Number of unique data for Amount is 836
## unique data for Amount is [ 1710 -1614 -1805 ...  3296  5278 -7538]
## =====================================
## Number of unique data for SaleExclGST is 2477
## unique data for SaleExclGST is [ 1368. -1614. -1805. ...  4767. -1512. -3208.]
## =====================================
## Number of unique data for GSTP is 5
## unique data for GSTP is [ 5 17  0  6  9]
## =====================================
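GSTP takes only five values (5, 17, 0, 6 and 9), which look like GST percentage rates. If that reading is right, GST should roughly equal SaleExclGST × GSTP / 100 on each row; a toy frame illustrates the check, with the relation between the real columns assumed from their names:

```python
import pandas as pd

# Sketch: recompute GST from the exclusive price and the rate, then compare
# with the recorded GST column (toy values; first row matches the dumps
# above, where SaleExclGST 1368 at 5% gives GST 68).
toy = pd.DataFrame({"SaleExclGST": [1368.0, 2000.0], "GSTP": [5, 17]})
toy["GST_calc"] = (toy["SaleExclGST"] * toy["GSTP"] / 100).round().astype(int)
print(toy["GST_calc"].tolist())  # [68, 340]
```

Large gaps between the recorded and recomputed GST would point at rounding conventions or data-entry errors worth investigating.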
## Number of unique data for GST is 758
## unique data for GST is [  68   -81   -90 ...  924  -220   897]
## =====================================
## Number of unique data for DiscPer is 258
## unique data for DiscPer is [ 0.000e+00  2.500e+01  3.000e+01 ...  5.940e+01  4.750e+01  1.010e+01]
## =====================================
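DiscPer contains negative values (down to about -113), which are suspect for a discount percentage; many plausibly belong to return rows. Flagging rows where DiscPer falls outside [0, 100] is a cheap data-quality check, sketched here on toy values copied from the dump:

```python
import pandas as pd

# Sketch: mark discount percentages outside the [0, 100] range for review
# (toy data stands in for the real df; the valid range is an assumption).
toy = pd.DataFrame({"DiscPer": [0.0, 25.0, -47.4, 100.0, -110.5]})
suspect = toy[(toy["DiscPer"] < 0) | (toy["DiscPer"] > 100)]
print(len(suspect))  # 2
```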
## Number of unique data for DiscAmount is 1000
## unique data for DiscAmount is [ 0.0000e+00  4.5100e+02  7.1300e+02 ...
##   1.0170e+03  5.9900e+02  2.2580e+03 -1.6480e+03 -4.8800e+02 -1.3950e+03
##   4.2380e+03 -1.3940e+03  1.3190e+03 -5.0400e+02 -7.0400e+02  2.2500e+02
##   8.3300e+02  1.2830e+03 -8.3300e+02  1.8840e+03  1.8320e+03 -1.0140e+03
##   5.5000e+02  7.7000e+02  1.3180e+03  1.1860e+03  1.1900e+03  3.5700e+02
##  -7.6700e+02  1.6040e+03  1.1440e+03  8.9700e+02]
## =====================================
## Number of unique data for BarcodeDiscPer is 19
## unique data for BarcodeDiscPer is [ 20   0  15  30  10 -30  50 -20 -50 -10  60 -15 -60 -40  40  25  14 -14
##  -25]
## =====================================
## Number of unique data for BarcodeDiscount is 952
## unique data for BarcodeDiscount is [  342     0   356  1027   570   239
##   ... (900+ further values omitted for readability) ...
##  -1053  3664 -1320 -1512]
## =====================================
## Number of unique data for NetAmount is 2176
## unique data for NetAmount is [ 1436. -1695. -1895. ...  2515. -2660.  6175.]
## =====================================
## Number of unique data for PointsEarned is 139
## unique data for PointsEarned is [   0   36   17   35  -17  -36   20   50   26   23   22   32   16    9
##    25   19   12   21   40   13   33   11   45   18    7   31    5    6
##    -5   15  -25    8   14  -22  -20  -16   60   28  100  -23  -50  -14
##   -13   10   38  -28  200  150   42   52  -35  -26  -18  -19  -21  -40
##    30   24  -12  -15   46  -33   75   37   72   29   80  -32   44   27
##   -10  -37   48   57   70  105   87  349  122  192  262  297  471   41
##   -11   54  -70   74  -30  -27   -9  -31  -29   58  -34  175  244  786
##   768   -7   -8   34  -45   51   85   55  -24  -42   65   63   56  187
##   299  821  411  149  112   53  -74   43  -43  -53   69  -65  751  559
##    49  111 -100  -66   66  140   90  215  108   39  120   84  106]
## =====================================
## Number of unique data for TaxPer is 2
## unique data for TaxPer is [6 9]
## =====================================
## Number of unique data for Cobrand Acc is 80
## unique data for Cobrand Acc is ['F/S LICENSE                  ' 'F/S LUXER PLAIN '
##  'F/S CAMBRIDGE EXECUTIVE ' 'F/S PRINCIPLE PLAIN    '
##  'F/S PRINCIPLE CLASSIC YARN DYED' 'F/S LUXER YARN DYED '
##  'F/S PRINCIPLE SWAN ' 'F/S CAMBRIDGE CASUAL' 'PORT FOLIO (SHADES) F/S '
##  'PORT FOLIO F/S  WCB      ' 'F/S PORT FOLIO SHIRT YARN DYED'
##  'F/S NO IRON EVER' 'F/S DENIM ' 'F/S LUXER PLAIN WCB '
##  'F/S CAMBRIDGE SINCE 1958' 'F/S ARISTO ' 'H/S EXECUTIVE'
##  'H/S PRINCIPLE CLASSIC ' 'F/S ARISTO MASON' 'OVERDYE F/S ' 'H/S LUXER'
##  'F/S TOMORROW          ' 'LICENSE H/S              '
##  'F/S PRINCIPLE POPLIN LUXE MILANO' 'PRINTED F/S SHIRTS'
##  'F/S PRINCIPLE Y/DYED LUXE MILANO ' 'H/S PORT FOLIO Y/D '
##  'D STUDIO  DESIGNERS SHIRT' 'OXFORD H/S' ' H/S TOMORROW'
##  'F/S AFTER HOURS         ' 'CHEMBREY F/S ' 'F/S PRINCIPLE OXFORD  '
##  'CHEMBREY H/S' 'AGE OF WISDOM F/S ' 'H/S PORT FOLIO WCB       '
##  'PRIVILEGE F/S ' 'SEERSUCKER F/S SHIRT' 'LIGHT WEIGHT H/S '
##  'LIGHT WEIGHT F/S ' 'PORTFOLIO SATEEN F/S ' 'F/S DEAD SHIRT ' 'ITALIAN'
##  'H/S AFTER HOURS           ' 'SEERSUCKER H/S SHIRT'
##  'LICENSE                  ' 'OXFORD LICENSE F/S' 'HERRING BONE F/S'
##  'MELANGE YARN DYED F/S ' 'FLANNEL F/S ' 'F/S SHARP CAMBRDIGE '
##  'CAMBRIDGE SHIRT' 'DOBBY F/S' 'Cotton Linen F/S '
##  'F/S ESSENTIALS MENS FORMAL SHIRTS' 'Cotton Slub F/S'
##  'OXFORD LICENSE H/S' 'F/S ACTIVE' 'PERSONALLY CAMBRIDGE'
##  'F/S D STUDIO              ' 'F/S ZERO TOLERANCE       ' 'F/S DOBBY'
##  'F/S HERRING BONE ' 'F/S MELANGE YARN DYED ' 'F/S LICENSE           '
##  'H/S LICENSE         ' 'H/S OXFORD LICENSE ' 'F/S OVERDYE '
##  'F/S OXFORD LICENSE ' 'F/S COTTON LINEN ' 'F/S PRINTED SHIRTS'
##  'H/S SEERSUCKER SHIRT' 'F/S CHEMBREY' 'H/S CHEMBREY ' 'F/S COTTON SLUB '
##  'F/S SEERSUCKER SHIRT' 'F/S FLANNEL' 'F/S AGE OF WISDOM '
##  'CAMBRIDGE UNIFORM ' 'F/S LIGHT WEIGHT ']
## =====================================

4.2 Step 8: Check the null values of the data

Several columns contain either repetitive values or NaNs, so we first compute the percentage of missing values in each column before deciding what to drop.

percent_missing = df.isnull().sum() * 100 / len(df)
missing_value_df = pd.DataFrame({'percent_missing': percent_missing})

missing_value_df
##                    percent_missing
## BillNo                    0.000000
## BillDate                  0.000000
## LoyaltyCard              94.967780
## Customer                  1.272083
## Description              90.148370
## BillMonth                 0.000000
## Warehouse                 0.000000
## RegionName                0.000000
## Location                  0.000000
## Category                  0.000000
## DepartmentName            0.000000
## BrandName                 0.231613
## CoBrand                   0.000000
## Barcode                   0.000000
## DesignNo                  0.000000
## Rejection                 0.000000
## SeasonName                0.000000
## Attribute1                0.003731
## Attribute2               78.640674
## Attribute3               53.847141
## Attribute4                0.000000
## Attribute5               49.659594
## Attribute6              100.000000
## Attribute7              100.000000
## Attribute8               26.839700
## LocalImport               0.000000
## Color                     0.000000
## Sizes                     0.000000
## DiscountType             24.951573
## SalesmanName              0.000000
## Qty                       0.000000
## SalesReturnReason        97.793404
## Price                     0.000000
## Amount                    0.000000
## SaleExclGST               0.000000
## GSTP                      0.000000
## GST                       0.000000
## DiscPer                   0.000000
## DiscAmount                0.000000
## BarcodeDiscPer            0.000000
## BarcodeDiscount           0.000000
## NetAmount                 0.000000
## PointsEarned              0.000000
## TaxPer                    0.000000
## Cobrand Acc               0.000000
import missingno as msno
msno.matrix(df)

Columns Attribute6 and Attribute7 have 100% missing values and no description is available for them, so both are dropped.

df.drop(['Attribute6','Attribute7'],axis=1,inplace=True)

5 EDA of the project

5.1 Step 9: Dropping useless columns

df.Attribute4.unique()
## array(['CAMBRIDGE'], dtype=object)

Column Attribute4 contains only a single value ('CAMBRIDGE'), so it carries no information. SalesmanName is also not a useful feature: we are not interested in identifying sales by individual salesperson, and keeping it could bias decision making.


df.drop('Attribute4',axis=1,inplace=True)
df.drop('SalesmanName',axis=1,inplace=True)
df.drop('Category',axis=1,inplace=True)

The data in columns CoBrand and Cobrand Acc is about 99% similar, so we keep only one of the two.

from fuzzywuzzy import fuzz

fuzz.token_sort_ratio(df['CoBrand'].to_string(), df['Cobrand Acc'].to_string())  # similarity of the two columns compared as strings
df.drop('CoBrand',axis=1,inplace=True)
df.drop('Barcode',axis=1,inplace=True)
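For a row-level view of how similar two string columns really are, here is a small stdlib sketch using difflib in place of fuzzywuzzy; the sample strings below are hypothetical stand-ins for the CoBrand columns.

```python
from difflib import SequenceMatcher
import pandas as pd

# Toy data standing in for CoBrand / Cobrand Acc (hypothetical values)
toy = pd.DataFrame({
    'CoBrand':     ['F/S LUXER PLAIN', 'F/S DENIM', 'H/S LUXER'],
    'Cobrand Acc': ['F/S LUXER PLAIN ', 'F/S DENIM ', 'OXFORD H/S'],
})

def similarity(a, b):
    """Ratio in [0, 1] between two strings, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

toy['sim'] = [similarity(a, b) for a, b in zip(toy['CoBrand'], toy['Cobrand Acc'])]
# The first two rows match exactly after stripping; the third does not.
print(toy['sim'].tolist())
```

Rows that differ only by trailing spaces score 1.0, which is exactly the near-duplication observed between the two cobrand columns.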

6 EDA of the project

6.1 Data Preprocessing

6.1.1 Renaming some columns

Based on the data description, we rename some columns to make them more readable:

  • Attribute1 is Inventory_status, which flags stock that came back to the warehouse and was returned to sale.
  • Attribute2 is the offer type at the shops.
  • Attribute8 is the class of the cloth.
  • Attribute3 is the import type, i.e. where the cloth is imported from.
df = df.rename(columns={'Attribute1':'Inventory_status',
                        'Attribute2':'Offers',
                        'Attribute8':'Class_of_cloth',
                        'Attribute3':'Import_type'})

6.1.2 For customer column

  • We distinguish returning and non-returning customers.

  • We identify customers by name and assume that rows with an identical name belong to a returning customer.

  • The later graphs will show whether returning customers are more profitable than non-returning ones.

df['Customer']=df.Customer.duplicated()
df['Customer'].replace(True,'Returning_Customer',inplace=True)
df['Customer'].replace(False,'Non_Returning_Customer',inplace=True)
df.Customer.value_counts()
## Returning_Customer        596405
## Non_Returning_Customer     73677
## Name: Customer, dtype: int64
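A toy illustration of the duplicated() rule underlying this flag: the first occurrence of each name is False (a new customer) and every repeat is True.

```python
import pandas as pd

names = pd.Series(['Ali', 'Sara', 'Ali', 'Omar', 'Sara', 'Ali'])

flags = names.duplicated()                      # first occurrence -> False, repeats -> True
labels = flags.map({True: 'returning', False: 'new'})

print(labels.tolist())
```

Note this heuristic treats distinct people who share a name as the same returning customer.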
  • For the loyalty card column we replace NaNs with No and every actual card value with Yes.
  • LoyaltyCard is the card offered to customers to get discounts and offers.
df['LoyaltyCard'] = np.where(df['LoyaltyCard'].isna(), 'No', 'Yes')

7 EDA of the project

7.0.1 Column imputation with sklearn's SimpleImputer

We use sklearn's SimpleImputer with the most-frequent strategy, filling the missing values in each remaining column with that column's mode.

imp = SimpleImputer(strategy="most_frequent")
df['Inventory_status']=imp.fit_transform(df[['Inventory_status']])
df['Import_type']=imp.fit_transform(df[['Import_type']])
df['Offers']=imp.fit_transform(df[['Offers']])
df['Attribute5']=imp.fit_transform(df[['Attribute5']])
df['BrandName']=imp.fit_transform(df[['BrandName']])
df['Description']=imp.fit_transform(df[['Description']])

df.SalesReturnReason = df.SalesReturnReason.replace(np.nan,'No information available')
df.Class_of_cloth = df.Class_of_cloth.replace(np.nan, "No information") # replacing NaN values with a 'No information' label

df.DiscountType = df.DiscountType.replace(np.nan, 'No Discount') # replacing NaN values with a 'No Discount' label
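A toy illustration of the most-frequent strategy on hypothetical Offers values: NaNs are replaced by the column mode.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

toy = pd.DataFrame({'Offers': ['Sale', np.nan, 'Sale', 'Clearance', np.nan]})

imp = SimpleImputer(strategy='most_frequent')
toy['Offers'] = imp.fit_transform(toy[['Offers']]).ravel()

print(toy['Offers'].tolist())
```

Both NaNs become 'Sale', the most frequent value in the column.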

8 EDA of the project

8.1 Summary of the data

df.describe().T
##                     count         mean          std  ...     50%     75%       max
## Qty              670082.0     0.899518     0.611418  ...     1.0     1.0      45.0
## Price            670082.0  2437.725566   714.776793  ...  2259.0  2376.0    6127.0
## Amount           670082.0  2173.785338  1621.785795  ...  2259.0  2376.0  106920.0
## SaleExclGST      670082.0  1740.625980  1489.339984  ...  1663.0  2186.0   77455.0
## GSTP             670082.0     6.092880     2.285465  ...     6.0     6.0      17.0
## GST              670082.0   109.105263   120.435080  ...    95.0   125.0    4647.0
## DiscPer          670082.0     0.382138     3.200854  ...     0.0     0.0     100.0
## DiscAmount       670082.0     8.756731    95.321598  ...     0.0     0.0   14830.0
## BarcodeDiscPer   670082.0    17.706990    20.796915  ...    20.0    30.0      60.0
## BarcodeDiscount  670082.0   423.838609   536.809144  ...   484.0   713.0   32085.0
## NetAmount        670082.0  1849.731244  1581.756942  ...  1747.0  2295.0   82102.0
## PointsEarned     670082.0     1.078650     6.030229  ...     0.0     0.0     821.0
## TaxPer           670082.0     6.359038     0.973759  ...     6.0     6.0       9.0
## 
## [13 rows x 8 columns]

9 Data Transformation

Now we will treat the outliers in the data and apply some statistical tests.

The numerical columns contain many outliers, as boxplots show. We will use the IQR method to remove them.


10 Data Transformation

# Impute outliers (outside 1.5*IQR) with the column mean
def impute_outliers_IQR_with_mean(df):
    Q1 = df.quantile(0.25)
    Q3 = df.quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5*IQR   # lower bound
    upper = Q3 + 1.5*IQR   # upper bound
    df = np.where(df > upper, df.mean(),
         np.where(df < lower, df.mean(), df))
    return df

# Same rule, but impute with the column median (used for the right-skewed columns below)
def impute_outliers_IQR_with_median(df):
    Q1 = df.quantile(0.25)
    Q3 = df.quantile(0.75)
    IQR = Q3 - Q1
    lower = Q1 - 1.5*IQR
    upper = Q3 + 1.5*IQR
    df = np.where(df > upper, df.median(),
         np.where(df < lower, df.median(), df))
    return df
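A self-contained toy check of the IQR rule on a small Series, mirroring the mean-imputation variant:

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 500.0])  # 500 is an obvious outlier

Q1, Q3 = s.quantile(0.25), s.quantile(0.75)
IQR = Q3 - Q1
lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR

# Values outside the fences are replaced by the column mean
cleaned = pd.Series(np.where((s > upper) | (s < lower), s.mean(), s))
print(cleaned.iloc[-1])   # the outlier becomes the mean instead of 500
```

The in-range values pass through unchanged; only the flagged outlier is rewritten.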

10.0.1 Checking skewness of the data

# Library for skewness
from scipy.stats import skew
print('Skew of Price',skew(df.Price))
## Skew of Price 1.7332960811793
print('Skew of Amount',skew(df.Amount))
## Skew of Amount 0.5264229479147174
print('Skew of SaleExclGST',skew(df.SaleExclGST))
## Skew of SaleExclGST 0.053907402377267245
print('Skew of GSTP',skew(df.GSTP))
## Skew of GSTP 3.422663381156023
print('Skew of GST',skew(df.GST))
## Skew of GST 1.8886742450354679
print('Skew of BarcodeDiscPer',skew(df.BarcodeDiscPer))
## Skew of BarcodeDiscPer -0.11344278021015818
print('Skew of BarcodeDiscount',skew(df.BarcodeDiscount))
## Skew of BarcodeDiscount 1.639017468026673
print('Skew of NetAmount',skew(df.NetAmount))
## Skew of NetAmount 0.11210500286062883
print('Skew of PointsEarned',skew(df.PointsEarned))
## Skew of PointsEarned 21.43903163597488
num_cols = df.select_dtypes(include='number')
i = 0
while i < 10:
    fig = plt.figure(figsize=[25,4])

    plt.subplot(1,2,1)
    sns.distplot(a=num_cols.iloc[:,i], hist=True)
    i += 1

    plt.subplot(1,2,2)
    sns.distplot(a=num_cols.iloc[:,i], hist=True)
    i += 1

    plt.show()


11 Data Transformation

Now we remove the outliers: the right-skewed columns are imputed with the median, while 'Price' and the near-symmetric columns are imputed with the mean.

df['Qty'] = impute_outliers_IQR_with_median(df['Qty'])
df['Price'] = impute_outliers_IQR_with_mean(df['Price'])
df['Amount'] = impute_outliers_IQR_with_median(df['Amount'])
df['SaleExclGST'] = impute_outliers_IQR_with_median(df['SaleExclGST'])
df['GSTP'] = impute_outliers_IQR_with_median(df['GSTP'])
df['GST'] = impute_outliers_IQR_with_median(df['GST'])
df['BarcodeDiscPer'] = impute_outliers_IQR_with_mean(df['BarcodeDiscPer'])
df['BarcodeDiscount'] = impute_outliers_IQR_with_median(df['BarcodeDiscount'])
df['NetAmount'] = impute_outliers_IQR_with_mean(df['NetAmount'])
df['PointsEarned'] = impute_outliers_IQR_with_median(df['PointsEarned'])
df['DiscPer'] = impute_outliers_IQR_with_median(df['DiscPer'])
df['DiscAmount'] = impute_outliers_IQR_with_median(df['DiscAmount'])

12 Data Transformation

We can apply the Shapiro-Wilk test to check whether the data is normally distributed.

# Shapiro wilk test
from scipy.stats import shapiro
print('ShapiroTest of Price',shapiro(df.Price))
## ShapiroTest of Price ShapiroResult(statistic=0.8969478011131287, pvalue=0.0)
## 
## /opt/anaconda3/lib/python3.9/site-packages/scipy/stats/morestats.py:1760: UserWarning: p-value may not be accurate for N > 5000.
##   warnings.warn("p-value may not be accurate for N > 5000.")
print('ShapiroTest of Amount',shapiro(df.Amount))
## ShapiroTest of Amount ShapiroResult(statistic=0.908415675163269, pvalue=0.0)
print('ShapiroTest of SaleExclGST',shapiro(df.SaleExclGST))
## ShapiroTest of SaleExclGST ShapiroResult(statistic=0.8960244059562683, pvalue=0.0)
print('ShapiroTest of GSTP',shapiro(df.GSTP))
## ShapiroTest of GSTP ShapiroResult(statistic=0.6320035457611084, pvalue=0.0)
print('ShapiroTest of GST',shapiro(df.GST))
## ShapiroTest of GST ShapiroResult(statistic=0.8931977152824402, pvalue=0.0)
print('ShapiroTest of BarcodeDiscPer',shapiro(df.BarcodeDiscPer))
## ShapiroTest of BarcodeDiscPer ShapiroResult(statistic=0.8688030242919922, pvalue=0.0)
print('ShapiroTest of BarcodeDiscount',shapiro(df.BarcodeDiscount))
## ShapiroTest of BarcodeDiscount ShapiroResult(statistic=0.8954393863677979, pvalue=0.0)
print('ShapiroTest of NetAmount',shapiro(df.NetAmount))
## ShapiroTest of NetAmount ShapiroResult(statistic=0.9133129715919495, pvalue=0.0)
print('ShapiroTest of PointsEarned',shapiro(df.PointsEarned))
## ShapiroTest of PointsEarned ShapiroResult(statistic=1.0, pvalue=1.0)
## 
## /opt/anaconda3/lib/python3.9/site-packages/scipy/stats/morestats.py:1757: UserWarning: Input data for shapiro has range zero. The results may not be accurate.
##   warnings.warn("Input data for shapiro has range zero. The results "
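For intuition, here is a toy sketch on seeded synthetic data (not the sales dataset): a normal sample should receive a relatively large p-value, while a strongly skewed one is firmly rejected.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=500)       # symmetric, bell-shaped
skewed_sample = rng.exponential(size=500)  # strongly right-skewed

w_norm, p_norm = shapiro(normal_sample)
w_skew, p_skew = shapiro(skewed_sample)

# The skewed sample is rejected far more decisively than the normal one
print(p_skew < p_norm)
```

With ~670k rows, as the warning above notes, the reported p-values are not accurate and essentially any real-world column will be "rejected"; the W statistic itself is the more informative quantity at that scale.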

13 Data Transformation

13.1 Step_3 Normalization of the numeric columns.

13.1.1 Methods of Normalization

Four common normalization techniques may be useful:

  • scaling to a range (min-max)
  • clipping
  • log scaling
  • z-score

We use the StandardScaler (z-score) to normalize the data.


from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
num_cols = ['Qty','Price','Amount','SaleExclGST','GSTP','GST','BarcodeDiscPer',
            'BarcodeDiscount','NetAmount','PointsEarned','DiscPer','DiscAmount']
df[num_cols] = scaler.fit_transform(df[num_cols])
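As a sanity check on the z-score transform, each standardized column should end up with mean ≈ 0 and (population) standard deviation ≈ 1; a toy verification:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[100.0], [200.0], [300.0], [400.0]])
Xs = StandardScaler().fit_transform(X)

# StandardScaler subtracts the mean and divides by the biased (ddof=0) std
print(Xs.mean().round(6), Xs.std().round(6))
```

Note that standardization rescales the data but does not make a non-normal distribution normal, which is why the Shapiro-Wilk statistics below barely change.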

Checking the normality after normalization

from scipy.stats import shapiro
print('ShapiroTest of Price',shapiro(df.Price))
## ShapiroTest of Price ShapiroResult(statistic=0.8945704102516174, pvalue=0.0)
## 
## /opt/anaconda3/lib/python3.9/site-packages/scipy/stats/morestats.py:1760: UserWarning: p-value may not be accurate for N > 5000.
##   warnings.warn("p-value may not be accurate for N > 5000.")
print('ShapiroTest of Amount',shapiro(df.Amount))
## ShapiroTest of Amount ShapiroResult(statistic=0.9033786058425903, pvalue=0.0)
print('ShapiroTest of SaleExclGST',shapiro(df.SaleExclGST))
## ShapiroTest of SaleExclGST ShapiroResult(statistic=0.8960385918617249, pvalue=0.0)
print('ShapiroTest of GSTP',shapiro(df.GSTP))
## ShapiroTest of GSTP ShapiroResult(statistic=0.6283426880836487, pvalue=0.0)
print('ShapiroTest of GST',shapiro(df.GST))
## ShapiroTest of GST ShapiroResult(statistic=0.8964444398880005, pvalue=0.0)
print('ShapiroTest of BarcodeDiscPer',shapiro(df.BarcodeDiscPer))
## ShapiroTest of BarcodeDiscPer ShapiroResult(statistic=0.8605928421020508, pvalue=0.0)
print('ShapiroTest of BarcodeDiscount',shapiro(df.BarcodeDiscount))
## ShapiroTest of BarcodeDiscount ShapiroResult(statistic=0.8955459594726562, pvalue=0.0)
print('ShapiroTest of NetAmount',shapiro(df.NetAmount))
## ShapiroTest of NetAmount ShapiroResult(statistic=0.9149541854858398, pvalue=0.0)
print('ShapiroTest of PointsEarned',shapiro(df.PointsEarned))
## ShapiroTest of PointsEarned ShapiroResult(statistic=1.0, pvalue=1.0)
## 
## /opt/anaconda3/lib/python3.9/site-packages/scipy/stats/morestats.py:1757: UserWarning: Input data for shapiro has range zero. The results may not be accurate.
##   warnings.warn("Input data for shapiro has range zero. The results "
print('ShapiroTest of DiscPer',shapiro(df.DiscPer))
## ShapiroTest of DiscPer ShapiroResult(statistic=1.0, pvalue=1.0)
print('ShapiroTest of DiscAmount',shapiro(df.DiscAmount))
## ShapiroTest of DiscAmount ShapiroResult(statistic=1.0, pvalue=1.0)

14 Applying Statistical tests

We test whether the mean Price differs between the Karachi and Lahore subsets (df_karachi and df_lahore) with an independent-samples t-test from researchpy:

import researchpy as rp

summary, results = rp.ttest(group1= df_karachi['Price'], group1_name= "Karachi",
                            group2= df_lahore['Price'], group2_name= "Lahore")
results
## Ttest_indResult(statistic=0.6562095394492546, pvalue=0.5116895372324733)

The high p-value (0.51) means we cannot reject the hypothesis that the mean prices in the two cities are equal.

image

15 Graphical Relations between Variables

df1=pd.read_csv('/Users/snawaz/Documents/pychilla2/teamproject_sep3/Deep_note_linked/cleaned_data.csv')

16 Machine learning Model Building

The dependent variables in the various ML algorithms will be:

  • Price for linear regression, with selected features as independent variables.
  • Price for multilinear regression, with all other variables as independent.
  • Inventory_status and LocalImport as targets in the classification models, with all other features independent.
  • Price in the deep learning model.
  • SeasonName for the two clustering models.

16.1 Step_1: Encoding the categorical columns

  • We will use one-hot encoding for the categorical variables; the example below encodes Offers (other categorical columns such as Class_of_cloth and LocalImport are handled the same way).

One_hot_encoded_data = df_Ml[['Offers']]

enc = OneHotEncoder()
enc_results = enc.fit_transform(One_hot_encoded_data)

encoded = pd.DataFrame(enc_results.toarray(),
                       columns=enc.get_feature_names_out(['Offers']))

17 Classification Models

  • Target features: Inventory_status and LocalImport
  • Independent features: all other features obtained after CV and feature selection

Libraries used throughout this machine learning section:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
import plotly.express as px
import scipy.special
import scipy.stats as stats
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show
import xgboost as xgb
import descartes
import folium
import sklearn

## ML libraries
from xgboost import XGBRegressor, XGBClassifier
from lightgbm import LGBMRegressor
from sklearn.linear_model import (LogisticRegression, LinearRegression, RidgeCV,
                                  Lasso, LassoCV, OrthogonalMatchingPursuit)
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.svm import SVR, LinearSVR, SVC
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier, GradientBoostingClassifier,
                              StackingRegressor, ExtraTreesRegressor, RandomForestRegressor,
                              GradientBoostingRegressor, AdaBoostRegressor)
from sklearn.feature_selection import RFECV, RFE
from sklearn.model_selection import (train_test_split, learning_curve, cross_val_predict,
                                     cross_validate, cross_val_score, KFold, RepeatedKFold,
                                     RepeatedStratifiedKFold, GridSearchCV)
from sklearn.cluster import DBSCAN, KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_samples, silhouette_score
from sklearn.metrics.cluster import adjusted_rand_score, contingency_matrix
from yellowbrick.cluster import SilhouetteVisualizer
from numpy import unique, where
from time import time

# Libraries for DL
from sklearn.multioutput import MultiOutputRegressor
from sklearn.datasets import make_regression
from sklearn.pipeline import Pipeline
from keras.models import Sequential
from keras.layers import Dense
from numpy import asarray
from pandas import set_option

# Scores
from sklearn.metrics import (precision_recall_curve, confusion_matrix, mean_squared_error,
                             mean_absolute_error, explained_variance_score, max_error, r2_score,
                             median_absolute_error, mean_squared_log_error, f1_score,
                             classification_report, accuracy_score, precision_score)

18 Classification Models

18.1 Local Import as target feature

18.1.1 Step 1: Subset of large dataset

  • We select a sample of 10,000 rows for the comparison of the different classification models.
df_M = df_Ml.sample(10000).reset_index(drop=True)

18.1.2 Step 2: Train-test split the data

  • The data is split beforehand into 70% training and 30% test data.
X=df_M.drop('LocalImport',axis=1)
y=df_M['LocalImport']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=44, shuffle =True)
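With test_size=0.30 the split proportions are deterministic; a toy check on synthetic arrays (not the sales data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_toy = np.arange(100).reshape(50, 2)   # 50 toy rows, 2 columns
y_toy = np.arange(50)

Xtr, Xte, ytr, yte = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=44, shuffle=True)

print(len(Xtr), len(Xte))   # 35 training rows, 15 test rows
```

Fixing random_state makes the shuffle reproducible, so repeated runs compare models on identical splits.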

18.1.3 Step 3: Make a list of classification models we want to apply

final_clf = None
clf_names = ["Logistic Regression", "KNN(3)", "XGBoost Classifier", "Random forest classifier", "Decision Tree Classifier",
            "Gradient Boosting Classifier", "Support Vector Machine"]

18.1.4 Step4: Apply each model on 10000 rows dataset and get the scores

classifiers = []
scores = []
for i in range(10):
    
    tempscores = []
    
    # logistic Regression
    lr_clf = LogisticRegression(n_jobs=-1)
    lr_clf.fit(X_train, y_train)
    tempscores.append((lr_clf.score(X_test, y_test))*100)
    
    # KNN with n_neighbors = 3
    knn3_clf = KNeighborsClassifier(n_neighbors=3, n_jobs=-1)
    knn3_clf.fit(X_train, y_train)
    tempscores.append((knn3_clf.score(X_test, y_test))*100)

    # XGBoost
    xgbc = XGBClassifier(n_jobs=-1,seed=41)
    xgbc.fit(X_train, y_train)
    tempscores.append((xgbc.score(X_test, y_test))*100)

    # Random Forest
    rf_clf = RandomForestClassifier(n_jobs=-1)
    rf_clf.fit(X_train, y_train)
    tempscores.append((rf_clf.score(X_test, y_test))*100)

    # Decision Tree
    dt_clf = DecisionTreeClassifier()
    dt_clf.fit(X_train, y_train)
    tempscores.append((dt_clf.score(X_test, y_test))*100)

    # Gradient Boosting 
    gb_clf = GradientBoostingClassifier()
    gb_clf.fit(X_train, y_train)
    tempscores.append((gb_clf.score(X_test, y_test))*100)
    
    #SVM
    svm_clf = SVC(gamma = "scale")
    svm_clf.fit(X_train, y_train)
    tempscores.append((svm_clf.score(X_test, y_test))*100)
    
    scores.append(tempscores)

scores = np.array(scores)
clfs = pd.DataFrame({"Classifier":clf_names})
for i in range(len(scores)):
    clfs['iteration' + str(i)] = scores[i].T

means = clfs.mean(axis=1, numeric_only=True)
means = means.values.tolist()

clfs["Average"] = means


clfs.set_index("Classifier", inplace = True)
print("Accuracies : ")
clfs["Average"].head(10)

image

  • At this stage we have identified the classifier with the highest average accuracy for predicting the LocalImport variable.

18.1.5 Step 5: Selecting full data for feature selection

X = df_Ml.drop('LocalImport', axis=1)  # full dataset, per the heading above
y = df_Ml['LocalImport']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=44, shuffle =True)

print('X_train shape is ' , X_train.shape)
print('X_test shape is ' , X_test.shape)
print('y_train shape is ' , y_train.shape)
print('y_test shape is ' , y_test.shape)
  • Now comes feature selection. We will use the model's feature importances to select the best features.

18.1.6 Step 6: Select and plot the best features by feature_importances_

model = GradientBoostingClassifier()
model.fit(X,y)
plt.style.use('ggplot')
plt.figure(figsize=(6,10))
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(50).plot(kind='barh')
plt.savefig("extra_tree.png",dpi=200)
plt.show()

image

Only a handful of the 38 features are important enough to train our final model, so we drop the rest below.

df_M = df_Ml[['BrandName','Attribute5','GSTP','SeasonName','BillMonth','Price','DesignNo','Import_type','LocalImport']]

We could also select features by their Pearson correlation coefficients with the target, using a correlation chart.
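As a sketch of that idea (on synthetic stand-in data, since the real encoded frame is not reproduced here; the column names and the 0.2 threshold are illustrative), features can be kept by their absolute Pearson correlation with the target:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the encoded dataset (illustrative column names)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Price": rng.normal(size=200),
    "GSTP": rng.normal(size=200),
    "Noise": rng.normal(size=200),
})
demo["LocalImport"] = (demo["Price"] + 0.5 * demo["GSTP"] > 0).astype(int)

# Absolute Pearson correlation of each feature with the target
corr = demo.corr()["LocalImport"].drop("LocalImport").abs()

# Keep features whose |correlation| exceeds a chosen threshold (0.2 here)
selected = corr[corr > 0.2].index.tolist()
print(selected)
```

This is a univariate filter, so unlike feature_importances_ it cannot see interactions between features.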

18.1.7 Step 7: Hyperparameters selection

We make a dictionary of hyperparameters for the Gradient Boosting model, then apply grid-search cross-validation to select the best hyperparameters for our chosen model.

#Creating Parameters
params = {
    'learning_rate':[0.1,1],
    'n_estimators':[5,9,11,12,13,15,20,25,26,29,31,50,75,100],
    'max_features':['auto','sqrt','log2'],
    'criterion':['friedman_mse', 'squared_error', 'mse'],
    'loss':['log_loss', 'deviance', 'exponential']
}


#Fitting the model

from sklearn.model_selection import GridSearchCV

rf = GradientBoostingClassifier()
grid = GridSearchCV(rf, params, cv=3, scoring='accuracy')
grid.fit(X_train, y_train)
print(grid.best_params_)
print("Accuracy:"+ str(grid.best_score_))

{'criterion': 'friedman_mse', 'learning_rate': 1, 'loss': 'exponential', 'max_features': 'auto', 'n_estimators': 75} Accuracy: 1.0

  • This gives us the best hyperparameters, with accuracy chosen as the scoring parameter.

  • Remember that the model stays the same, trained on the same X_train, y_train data.

18.1.8 Step 8: Final application of model

# applying model with best hyperparameters

rf = GradientBoostingClassifier(criterion='friedman_mse', learning_rate=1, loss='exponential', max_features='auto', n_estimators=75)

rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)

# from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print("Accuracy Score: ", accuracy_score(y_test, y_pred))

print("Confusion Matrix: ", confusion_matrix(y_test, y_pred)) 

Accuracy Score: 1.0 Confusion Matrix: [[ 9895 0] [ 0 191130]]

  • We achieved 100% accuracy with the final model. (A perfect score like this can be a sign of target leakage, e.g. a feature such as Import_type that closely encodes LocalImport, and is worth double-checking.)
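Beyond accuracy and the confusion matrix, sklearn's classification_report prints per-class precision, recall, and F1. A minimal sketch on dummy labels (the real y_test, y_pred would be substituted here):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Dummy binary labels standing in for y_test and y_pred
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
# Per-class precision, recall, F1 and support
print(classification_report(y_true, y_pred))
```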

18.1.9 Step 9: Model scores and evaluation

from yellowbrick.classifier import ClassificationReport

# Instantiate the classification model and visualizer

visualizer = ClassificationReport(rf)

visualizer.fit(X_train, y_train)  # Fit the visualizer and the model
visualizer.score(X_test, y_test)  # Evaluate the model on the test data
visualizer.show() 

image

18.1.10 Step 10: Saving the model to get predictions later

import pickle
pkl_filename = "localImport_model.pkl"
with open(pkl_filename, 'wb') as file:
    pickle.dump(grid, file)

# Load from file
with open(pkl_filename, 'rb') as file:
    pickle_model = pickle.load(file)

# Calculate the accuracy score and predict target values
score = pickle_model.score(X_test, y_test)


y_predict = pickle_model.predict(X_test)

19 Classification Models

19.1 Inventory_status as target feature.

Best Params: {'criterion': 'entropy', 'max_features': None, 'n_estimators': 75, 'random_state': 1} Train MSE: 0.0 Test MSE: 0.146
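A grid search that produces best parameters of this shape can be sketched as follows (on synthetic data; the grid values here are illustrative, not the ones actually used):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic classification problem standing in for inventory_status
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# Illustrative grid over the same hyperparameter names as the reported best params
params = {
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", None],
    "n_estimators": [25, 75],
}
grid = GridSearchCV(RandomForestClassifier(random_state=1), params, cv=3, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_)
print("CV accuracy:", round(grid.best_score_, 3))
```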

modelkn = RandomForestClassifier(random_state=1,criterion= 'entropy' ,n_jobs=-1, n_estimators=75, max_features=None) 

modelkn= modelkn.fit(X_train,y_train)

y_11 = modelkn.predict(X_test)

Model scores for inventory_status

image

Confusion matrix for inventory_status

image


20 Regression Models

20.1 Price and SaleExclGST as target features.

  • Target features: Price of the article, and SaleExclGST (revenue gained by the company) plus NetAmount (paid by the customer).

  • One regression model has a single target feature, while the other has two target features.

  • The only difference from the previous approach is the use of RepeatedKFold cross-validation to select the best model.

  • Model evaluation is done with the R2 score and the negative RMSE score.

  • Again we use a sample of 10,000 rows to compare models, followed by the full sample to train the final model.

models = {}
models['lr']= LinearRegression()
models['dr'] = DecisionTreeRegressor()
models['rf'] = RandomForestRegressor()
models['kn']=  KNeighborsRegressor()
models['ad'] = AdaBoostRegressor()
models['ex'] = ExtraTreesRegressor()
models['sv'] = SVR()


from sklearn import model_selection

for model in models:
  # use the imported model_selection namespace (bare `sklearn` is not imported)
  cv = model_selection.RepeatedKFold(n_splits=100, n_repeats=1, random_state=1)
  n_scores = model_selection.cross_val_score(models[model], X, y, scoring='neg_root_mean_squared_error', cv=cv, n_jobs=-1)
  print(model, np.mean(n_scores), np.std(n_scores))

image

20.1.1 Features selection by two methods: RFE and SelectKBest

image

image
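The two selectors shown in the figures can be sketched like this (on synthetic regression data; the linear estimator and k = 4 are illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression

# Synthetic regression data with 4 genuinely informative features out of 10
X, y = make_regression(n_samples=200, n_features=10, n_informative=4, random_state=0)

# RFE: recursively drop the weakest feature until 4 remain
rfe = RFE(LinearRegression(), n_features_to_select=4).fit(X, y)
print("RFE mask:", rfe.support_)

# SelectKBest: keep the 4 features with the highest univariate F-score
skb = SelectKBest(f_regression, k=4).fit(X, y)
print("SelectKBest mask:", skb.get_support())
```

RFE is model-aware but slower; SelectKBest is a fast univariate filter, so the two masks need not agree.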

20.2 Price as target feature.

Comparison of regression models for predicting price. The negative RMSE score is used as the evaluation metric.

image

Optimum feature selection for price prediction

image

image

In the end we get an R2 score of 1

image

and the other scores are given below

R-squared score (training): 1.000 R-squared score (test): 0.997

Test MSE: 0.002528635026436538 Test RMSE: 0.001264317513218269
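As a sanity check on such reported scores, note that RMSE is the square root of MSE (not half of it); both can be computed directly with sklearn on illustrative values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Small illustrative targets and predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.7, 6.8])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # RMSE is sqrt(MSE)
print(mse, rmse)
```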


21 Clustering Models

We will cluster on the variable SeasonName, finding the best number of clusters using the elbow method and the silhouette score.

21.1 Finding best number of clusters

21.1.1 Silhouette method

#Use silhouette score
range_n_clusters = list (range(2,10))
print ("Number of clusters from 2 to 9: \n", range_n_clusters)

fig, ax = plt.subplots(4, 2, figsize=(15, 12))  # 8 cluster counts (2-9) need a 4x2 grid of axes
for n_clusters in range_n_clusters:
    clusterer = MiniBatchKMeans(n_clusters=n_clusters)
    preds = clusterer.fit_predict(df_Ml)
    centers = clusterer.cluster_centers_

    q, mod = divmod(n_clusters, 2)

    score = silhouette_score(df_Ml, preds)
    print("For n_clusters = {}, the silhouette score is {}".format(n_clusters, score))
    visualizer = SilhouetteVisualizer(clusterer, colors='yellowbrick', ax=ax[q-1][mod])
    visualizer.fit(df_Ml)

21.1.2 Elbow method

inertias = []
for n_clusters in range(2, 15):
 km = KMeans(n_clusters=n_clusters).fit(df_Ml)
 inertias.append(km.inertia_)
 
plt.plot(range(2, 15), inertias, 'k')
plt.title("Inertia vs Number of Clusters")
plt.xlabel("Number of clusters")
plt.ylabel("Inertia")

22 Clustering Models

22.1 MiniBatchK-Means Clustering

# define the model
model = MiniBatchKMeans(n_clusters=6)
# fit the model
model.fit(df_Ml)
# assign a cluster to each example
yhat = model.predict(df_Ml)
# retrieve unique clusters
clusters = unique(yhat)
# create scatter plot for samples from each cluster
X_arr = df_Ml.values  # positional indexing needs a NumPy array, not a DataFrame
for cluster in clusters:
    # get row indexes for samples with this cluster
    row_ix = where(yhat == cluster)
    # create scatter of these samples (first two columns)
    plt.scatter(X_arr[row_ix, 0], X_arr[row_ix, 1])
# show the plot
plt.show()

22.2 DBSCAN Clustering

# define the model
dbscan_model = DBSCAN(eps=0.25, min_samples=9)

# train the model and assign each data point to a cluster
# (DBSCAN has no predict method, so we use fit_predict)
dbscan_result = dbscan_model.fit_predict(df_Ml)

# get all of the unique clusters (-1 marks noise points)
dbscan_clusters = unique(dbscan_result)

# plot the DBSCAN clusters
X_arr = df_Ml.values  # positional indexing needs a NumPy array
for dbscan_cluster in dbscan_clusters:
    # get data points that fall in this cluster
    index = where(dbscan_result == dbscan_cluster)
    # make the plot (first two columns)
    plt.scatter(X_arr[index, 0], X_arr[index, 1])

# show the DBSCAN plot
plt.show()
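A common way to choose DBSCAN's eps is the k-distance plot: sort each point's distance to its k-th nearest neighbour (k chosen near min_samples) and look for the knee. A minimal sketch on synthetic blobs (the real df_Ml would be substituted for X):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Synthetic 2-D data standing in for the encoded dataset
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Distance from each point to its 9th nearest neighbour (matching min_samples=9);
# a sorted plot of these distances has a "knee" near a reasonable eps
nn = NearestNeighbors(n_neighbors=9).fit(X)
dists, _ = nn.kneighbors(X)
k_dist = np.sort(dists[:, -1])
print(round(float(k_dist[len(k_dist) // 2]), 3))  # median k-NN distance
```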

23 Deep Learning Models

Deep learning model for predicting price, and comparison with the regression models.

We are using a sequential model with 4 fully-connected layers. ReLU is more popular in many deep neural networks, but I am using tanh activations on a trial basis.

Sigmoid is almost never used because it is slow to train. We add dropout layers to reduce overfitting.

The model is compiled with the Adam optimizer and MSE loss.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Activation, Dropout
from tensorflow.keras.optimizers import Adam

X_train = np.array(X_train)
X_test = np.array(X_test)
y_train = np.array(y_train)
y_test = np.array(y_test)

model = Sequential()

model.add(Dense(X_train.shape[1], activation='relu'))
model.add(Dense(32, activation='tanh'))
model.add(Dropout(0.2))

model.add(Dense(64, activation='tanh'))
model.add(Dropout(0.2))

model.add(Dense(128, activation='tanh'))
# model.add(Dropout(0.2))

model.add(Dense(512, activation='tanh'))
model.add(Dropout(0.1))
model.add(Dense(1))

model.compile(optimizer=Adam(0.00001), loss='mse')

r = model.fit(X_train, y_train,
              validation_data=(X_test,y_test),
              batch_size=1,
              epochs=100)
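The History object returned by model.fit (r above) records the per-epoch losses; a minimal sketch of plotting the learning curves, using a stand-in history dict with made-up values so it runs without TensorFlow:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from pathlib import Path

# Stand-in for r.history as returned by model.fit(...) (values are illustrative)
history = {"loss": [9.1, 4.2, 2.0, 1.1], "val_loss": [9.5, 4.8, 2.6, 1.9]}

plt.plot(history["loss"], label="train loss")
plt.plot(history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("MSE loss")
plt.legend()
plt.savefig("dl_loss_curves.png", dpi=150)

saved = Path("dl_loss_curves.png").exists()
```

A widening gap between the two curves over epochs is the usual sign of overfitting.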

24 Conclusion

Cambridge Industries maintains a database of their sales. All information related to customer choices, such as loyalty cards, product exchanges, and the type of product bought, is available in the database. We performed EDA, followed by data cleaning and feature engineering. Statistical analysis shows that there is no significant difference between the offline and online sales offered by the company. We took subsets of the data based on different products and locations, and a map was shown of where the products are bought and delivered. We used different machine learning models (regression, classification, clustering, and a deep neural network) with the target features Sales (the company's revenue), inventory_status, SeasonName, and Price, respectively.